
Performance (AUPRC) (Operator Toolbox)

Synopsis

This operator calculates the Area Under the Precision Recall Curve (AUPRC), which is often considered a good alternative to the traditional AUC measure.

Description

This operator calculates the AUPRC for a binominal classification. The AUPRC is similar to the AUC, but while the AUC measures the area under the ROC curve (true positive rate vs. false positive rate), the AUPRC measures the area under the precision-recall curve. While the AUC is independent of the class balance, the AUPRC is not. Davis and Goadrich show that a curve dominates in ROC space if and only if it dominates in PR space, but an algorithm that optimizes the AUC is not guaranteed to also optimize the AUPRC.

For details see: http://mark.goadrich.com/articles/davisgoadrichcamera2.pdf
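To see the class-balance effect outside of RapidMiner, the following minimal Python sketch (an illustration only, not part of this operator; the dataset, model, and roughly 5% positive rate are assumptions chosen for the example) trains a classifier on an imbalanced binominal problem and computes both measures with scikit-learn:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import auc, precision_recall_curve, roc_auc_score
    from sklearn.model_selection import train_test_split

    # Imbalanced binominal problem: roughly 5% positive examples (an assumed
    # rate, chosen to make the contrast between AUC and AUPRC visible).
    X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).decision_function(X_te)

    # AUC: area under the ROC curve (true positive rate vs. false positive rate).
    print("AUC  :", roc_auc_score(y_te, scores))

    # AUPRC: area under the precision-recall curve; on imbalanced data this is
    # typically much lower than the AUC for the same model.
    precision, recall, _ = precision_recall_curve(y_te, scores)
    print("AUPRC:", auc(recall, precision))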

In addition, this operator can also calculate the accuracy and the AUC. If a performance vector is provided at the performance input port, the newly calculated performance criteria are added to it and the result is delivered at the performance output port.

Input

  • lab (Data Table)

    This input port expects a labeled ExampleSet; operators such as Apply Model provide labeled data. Make sure that the ExampleSet has both a label attribute and a prediction attribute. See the Set Role operator for more details on the label and prediction roles of attributes. Also make sure that the label attribute of the ExampleSet is of binominal type, i.e. the label has only two possible values.

  • per (Performance Vector)

    This is an optional input port that expects a performance vector. The newly calculated performance criteria will be appended to it.

Output

  • per (Performance Vector)

    This port delivers a performance vector.

  • exa (Data Table)

    The ExampleSet that was given as input is passed through to this port without changes. This is usually used to reuse the same ExampleSet in further operators or to view it in the Results Workspace.

Parameters

  • main_criterion The main criterion is used for comparisons and needs to be specified only for processes where performance vectors are compared, e.g. attribute selection or other meta optimization setups. If no main criterion is selected, the first criterion in the resulting performance vector will be assumed to be the main criterion. Range: selection
  • accuracy Relative number of correctly classified examples, i.e. the percentage of correct predictions. Range: boolean
  • AUC Area under the ROC curve. The ROC curve plots the true positive rate against the false positive rate. A perfect classifier has an AUC of 1.0; a random classifier has an AUC of 0.5. Range: boolean
  • AUPRC Area under the precision-recall curve, which plots precision against recall. A perfect classifier has an AUPRC of 1.0, while a random classifier's AUPRC equals the fraction of positive examples (see the sketch after this parameter list). Range: boolean
  • skip_undefined_labels

    When this parameter is true, examples not belonging to a defined class are ignored.

    Range: boolean
  • use_example_weights

    If this parameter is true, example weights are used in the performance calculation.

    Range: boolean
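
The baselines quoted above can be checked directly. In this minimal sketch (an illustration under assumed data, not part of the operator; the 10% positive rate is an arbitrary choice), examples are scored at random: the AUC lands near 0.5, while the AUPRC approaches the fraction of positives.

    import numpy as np
    from sklearn.metrics import auc, precision_recall_curve, roc_auc_score

    rng = np.random.default_rng(0)
    y = (rng.random(100_000) < 0.1).astype(int)  # 10% positives (assumed rate)
    scores = rng.random(y.size)                  # uninformative random scores

    precision, recall, _ = precision_recall_curve(y, scores)
    print("AUC  :", roc_auc_score(y, scores))    # close to 0.5
    print("AUPRC:", auc(recall, precision))      # close to 0.1, the positive rate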

Tutorial Processes

AUPRC for SVM on Sonar
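
As a rough Python analogue of this tutorial process (an illustration only; the tutorial itself presumably retrieves RapidMiner's Sonar sample data, trains an SVM, applies the model, and evaluates with this operator), one could compute the AUPRC for an SVM on the Sonar data fetched from OpenML. The dataset name "sonar" on OpenML and the "Mine" label value are assumptions.

    from sklearn.datasets import fetch_openml
    from sklearn.metrics import auc, precision_recall_curve
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Sonar: 208 examples with a two-class ("Rock"/"Mine") label; assumes the
    # dataset is available on OpenML under this name.
    X, y = fetch_openml("sonar", version=1, return_X_y=True, as_frame=False)
    y = (y == "Mine").astype(int)  # encode the binominal label as 0/1

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    scores = SVC().fit(X_tr, y_tr).decision_function(X_te)

    precision, recall, _ = precision_recall_curve(y_te, scores)
    print("AUPRC:", auc(recall, precision))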